ABSTRACT

Transcription activator-like effector (TALE) proteins 
can be designed to bind virtually any DNA sequence. 
General guidelines for design of TALE DNA-binding
domains suggest that the 5'-most base of the DNA
sequence bound by the TALE (the N0 base) should
be a thymine. We quantified the N0 requirement by
analysis of the activities of TALE transcription
factors (TALE-TFs), TALE recombinases (TALE-Rs)
and TALE nucleases (TALENs) with each DNA base
at this position. In the absence of a 5' T, we
observed decreases of up to >1000-fold in TALE-TF
activity, up to 100-fold in TALE-R activity and up
to 10-fold in TALEN activity compared with target
sequences containing a 5' T. To develop TALE
architectures that recognize all possible N0 bases,
we used structure-guided library design coupled with
TALE-R activity selections to evolve novel TALE
N-terminal domains to accommodate any N0 base.
A G-selective domain
and broadly reactive domains were isolated and 
characterized. The engineered TALE domains 
selected in the TALE-R format demonstrated modu- 
larity and were active in TALE-TF and TALEN archi- 
tectures. Evolved N-terminal domains provide 
effective and unconstrained TALE-based targeting 
of any DNA sequence as TALE binding proteins 
and designer enzymes. 

INTRODUCTION 

Transcription activator-like effector (TALE) proteins can 
be designed to bind virtually any DNA sequence of inter- 
est (1). The DNA binding sites for natural TALE tran- 
scription factors (TALE-TFs) that target plant avirulence 
genes have a 5' thymidine (1-3). Synthetic TALE-TFs also
have this requirement. Recent structural data indicate that
there is an interaction between the N-terminal domain
(NTD) and a 5' T of the target sequence (4). A survey of
the recent TALE nuclease (TALEN) literature yielded
conflicting data regarding the importance of the first
base of the target sequence, the N0 residue (5-8).
Additionally, there have been no studies regarding the
impact of the N0 base on the activities of TALE recombinases
(TALE-Rs). Here, we quantified the impact of the
N0 base in the binding regions of TALE-Rs, TALE-TFs,
TALE DNA-binding domains expressed as fusions with
maltose binding protein (MBP-TALEs) and TALENs.
Each of these TALE platforms has distinct N- and
C-terminal architectures, but all demonstrated highest
activity when the N0 residue was a thymidine. To
simplify the rules for constructing effective TALEs in 
these platforms, and allow precision genome engineering 
applications at any arbitrary DNA sequence, we devised a 
structure-guided activity selection using our recently de- 
veloped TALE-R system. Novel NTD sequences were 
identified that provided highly active and selective 
TALE-R activity on TALE binding sites with 5' G, and 
additional domain sequences were selected that permitted 
general targeting of any 5' N0 residue. These domains were
imported into TALE-TF, MBP-TALE and TALEN architectures
and consistently exhibited greater activity than
did the wild-type NTD on target sequences with non-T
5' residues. Our novel NTDs are compatible with the
Golden Gate TALEN assembly protocol and now make
possible the efficient construction of TALE transcription
factors, recombinases, nucleases and DNA-binding
proteins that recognize any DNA sequence, allowing for
precise and unconstrained positioning of TALE-based
proteins on DNA without regard to the 5' T rule that
limits most natural TALE proteins.

MATERIALS AND METHODS 

Oligonucleotides 

Primers and other oligonucleotides (Supplementary Infor- 
mation) were ordered from Integrated DNA Technologies 
(San Diego, CA). 

Generation of TALE-R NTD evolution plasmids

The TALE-R system previously reported by Mercer et al. 
(9) was adapted for this study. Briefly, pBCS (containing 
chloramphenicol and carbenicillin resistance genes) was 
digested with HindIII/SpeI. The stuffer (Avr X, where X
is the N0 base), containing twin recombinase sites, was
digested with HindIII/XbaI and ligated into the vector
to create a split beta-lactamase gene. pBCS AvrX was
then digested with BamHI/SacI, and Gin127-N-stuffer-Avr15
was digested with BamHI/SacI and ligated into
the vector to create Gin127-N-stuffer-Avr15-X. The
stuffer was digested with NotI/StuI for evolutions at the
N-1 TALE hairpin and NotI/SphI for evolutions at the N0
TALE hairpin.

Generation of TALE NTD evolution libraries 

Primer ptal127 NotI fwd and reverse primers KXXG lib
rev or KXXXX lib rev were used to generate N-terminal
variants at the N-1 TALE hairpin; the products were
digested with NotI/StuI and then ligated into digested
Gin127-AvrX. Forward primer ptal127 NotI fwd and
reverse primer KRGG Lib Rev were used to PCR
amplify a library with mutations in the N0 TALE
hairpin. This was subsequently digested with NotI/SphI
and ligated into NotI/SphI-digested Gin127-AvrX.
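
As a rough sizing aid for the libraries described above, the theoretical diversity can be estimated from the number of randomized positions. The sketch below is illustrative only: it reads each X in the primer names (KXXG, KXXXX) as one fully randomized residue and assumes NNK degenerate codons (32 codons per position), which the text does not state. The function name `library_diversity` is hypothetical.

```python
# Theoretical diversity of a randomized hairpin library.
# ASSUMPTION: each randomized position (X) is encoded by an NNK codon
# (32 codons covering all 20 amino acids); the source does not state
# the codon scheme used in the library primers.

CODONS_PER_NNK = 32
AMINO_ACIDS = 20


def library_diversity(pattern: str) -> tuple:
    """Return (DNA variants, protein variants) for a pattern such as 'KXXG'."""
    n_random = pattern.count("X")
    return CODONS_PER_NNK ** n_random, AMINO_ACIDS ** n_random


for name in ("KXXG", "KXXXX"):
    dna, protein = library_diversity(name)
    print(f"{name}: {dna} DNA variants, {protein} protein variants")
```

Under these assumptions, both libraries are small enough to be covered by a typical electroporation, but the actual library sizes depend on the codon scheme and transformation efficiency, neither of which is given here.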

TALE-R NTD evolution assay 

Round 1 ligations were ethanol precipitated and transformed
into electrocompetent TOP10 F' cells, then recovered
in SOC for 1 h. The cells were grown overnight
in 100 ml Super Broth (SB) media containing 100 µg/ml
chloramphenicol. DNA was isolated via standard procedures.
The resulting plasmid DNA (Rd 1 input) was transformed
into electrocompetent TOP10 F' cells; cells were
grown overnight in 100 ml of SB containing 100 µg/ml
carbenicillin and 100 µg/ml chloramphenicol. Plasmid DNA
was isolated via standard procedures. Round 1 output
was digested with NotI/XbaI and ligated into the
Gin127-AvrX vector with complementary sticky ends.
This protocol was repeated three to four times, at which
point a consensus sequence was observed and clones were
characterized.

Measurement of N-terminal TALEN activity 

Four TALEN pairs containing each possible 5' base were
generated using the Golden Gate protocol (3,10). Fusion A
and B plasmids were directly ligated via a second Golden
Gate reaction into the Goldy TALEN (N Δ152/C +63)
framework. The NTD was modified by digesting the
pCAG vector with BglII/NsiI and ligating with PCR-amplified
NTD digested with BglII/NsiI. TALEN pairs
(50-75 ng each TALEN/well) were transfected into HeLa
cells in wells of 96-well plates at a density of 1.5 × 10^4
cells/well. After transfection, cells were placed in a 37°C
incubator for 24 h, then were moved to 30°C for 2 days
and then moved to 37°C for 24 h. Genomic DNA was
isolated according to a published protocol, and DNA
mutation rates were quantified with the Cel I Surveyor
assay and by sequencing (11). For Cel I assays, genomic
DNA was amplified by nested PCR, first with primers
CCR5 outer fwd/CCR5 outer rev and then with CCR5
inner fwd/CCR5 inner rev. For sequencing of indels, the
second PCR was performed with CCR5 indel fwd/CCR5
indel rev. Fragments were then digested with BamHI/EcoRI
and ligated into pUC19 with complementary
digestion. 
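
The text does not state how gel band intensities from the Cel I assay were converted into mutation rates. A minimal sketch of one widely used conversion is given below, assuming that heteroduplexes form by random reannealing of mutant and wild-type strands, so that indel (%) = 100 × (1 − √(1 − f)), where f is the cleaved fraction of the PCR product. The function name and the band intensities are hypothetical.

```python
import math


def indel_percent(uncut: float, cut_a: float, cut_b: float) -> float:
    """Estimate indel frequency (%) from Cel I / Surveyor band intensities.

    ASSUMPTION: random reannealing of mutant and wild-type strands, so the
    cleaved fraction f relates to the indel frequency p by f = 1 - (1 - p)^2.
    This is a common convention, not necessarily the one used in the study.
    """
    total = uncut + cut_a + cut_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_a + cut_b) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical densitometry values: one parental band, two cleavage products.
print(f"{indel_percent(uncut=64, cut_a=18, cut_b=18):.1f}% indels")
```

Because the estimate is nonlinear in the cleaved fraction, fold changes between NTD variants are better computed on the converted indel percentages than on raw band ratios.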

TALE-TFs and luciferase assay 

Variant NTDs from the recombinase selection were PCR 
amplified with primers ptal127 SFI fwd and N-Term
SphI. The PCR product was digested with
NotI/StuI and ligated into pTAL127-SFI Avr15, which
contains twin SfiI digestion sites facilitating transfer of
the N-terminal-modified TALE from pTAL127-SFI
Avr15 into pcDNA 3.0 VP64. Corresponding TALE
binding sites were cloned into the pGL3 Basic vector
(Promega) upstream of the luciferase gene. For each
assay, 100 ng of pcDNA was co-transfected with 5 ng of
pGL3 vector and 1 ng of pRL Renilla luciferase control
vector into HEK293T cells in a well of a 96-well plate using
Lipofectamine 2000 (Life Technologies) according to the
manufacturer's specifications. After 48 h, cells were washed
and lysed, and luciferase activity was assessed with the Dual-
Luciferase reporter system (Promega) on a Veritas Microplate
luminometer (Turner Biosystems). Transfections
were done in triplicate and results averaged. 
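
The dual-luciferase readout described above is typically reduced to a single activity value per construct by normalizing firefly signal to the Renilla control in each well and averaging the replicates. The following sketch illustrates that arithmetic; the exact normalization used in the study is not stated, and all readings and function names are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio across replicate wells."""
    if len(firefly) != len(renilla) or not firefly:
        raise ValueError("need matching, non-empty replicate lists")
    return mean([f / r for f, r in zip(firefly, renilla)])


def fold_change(sample: float, reference: float) -> float:
    """Activity of a sample relative to a reference condition."""
    return sample / reference


# Hypothetical triplicate readings (arbitrary luminescence units).
with_tale = normalized_activity([200, 400, 600], [100, 100, 100])
reporter_only = normalized_activity([50, 50, 50], [100, 100, 100])
print(fold_change(with_tale, reporter_only))
```

The same `fold_change` helper can be applied to compare a non-T 5' site against the 5' T site, which is how the reductions reported in this study are expressed.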

MBP-TALE assay 

Affinity assays of MBP-TALE binding to biotinylated 
oligonucleotides were performed using the protocol 
described by Segal et al. (12). Briefly, AvrXa7 TALE 
domains were expressed from pMAL MBP-AvrXa7 
plasmid in XL1-Blue cells and purified on amylose resin.
Biotinylated oligonucleotides containing the
AvrXa7 target site with modified 5' residues were used
to determine TALE-binding activity in sandwich 
enzyme-linked immunosorbent assay format. Antibodies 
targeting the MBP substituent were used for assay 
development. 

RESULTS 

Preliminary analysis of the 5' T rule 

A recent crystal structure of a TALE protein bound to the
PthXo7 DNA sequence revealed a unique interaction
between W232 in the N-1 hairpin and a thymidine at the
5' end of the contacted region of the DNA substrate (the N0
base) (4). This study provided a structural basis for the
previously established 5' T rule reported when the TALE
code was first deciphered (Figure 1a and b) (2). There are
conflicting data regarding the importance of the first base
of the target sequence of TALENs (5-8). We initially
assessed the requirement for a 5' T in the target DNA in
the context of TALE-Rs using four split beta-lactamase
TALE recombinase selection vectors containing four
AvrXa7 binding sites with all possible 5' residues flanking
a Gin32G core (Figure 1c). We then evaluated recognition
of the N0 residue by TALE-TFs using four luciferase

Figure 1. The specificity of the TALE NTD. (A) Illustration of a TALE bound to its target DNA (green, RVD domain; magenta, N-1 hairpin
domain). (B) Structural analysis suggests contact of the 5' T by W232 of the N-1 hairpin. This hairpin shares significant sequence homology with
RVD hairpins. (C-F) Analyses of NT-T (wt) NTD in the context of (C) AvrXa7 TALE-R, (D) AvrXa7 TALE-TF, (E) AvrXa7 MBP-TALE and (F)
a CCR5-targeting TALEN. (* = P < 0.05, ** = P < 0.01, *** = P < 0.001 compared with 5' T).

reporter vectors containing a pentamer AvrXa7 promoter
region with recognition sites containing each possible 5'
residue (Figure 1d) (9,13). With bases other than a 5' T,
we observed decreases in activity of up to >100-fold in TALE-Rs
and 1000-fold in TALE-TFs relative to the sequence
with a 5' T (Figure 1c and d). These reductions were
observed despite variations in the C-terminal architectures
of these chimeras that reportedly remove the 5' T bias,
especially in the presence of a greatly shortened C-terminal
domain (CTD) (7,14). Enzyme-linked immunosorbent
assay also indicated decreased affinity of MBP-TALE
DNA-binding proteins toward target oligonucleotides
with non-T 5' residues (Figure 1e). Finally, examination
of the activity of designed TALENs with wild-type NTDs
on targets with non-T 5' nucleotides showed up to a 10-fold
decrease in activity versus those with a 5' T (Figure 1f). Our
results indicate that a 5' T is an important design parameter 
for maximally effective TALE domains in the context of 
recombinases, transcription factors, nucleases and simple 
DNA-binding proteins. 
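
To illustrate how the 5' T rule restricts site choice, the sketch below enumerates candidate binding sites on one strand of a sequence, treating the N0 base as the nucleotide immediately 5' of the RVD-contacted target. It is a simplified illustration: it scans the forward strand only, ignores RVD composition and spacer constraints, and uses a hypothetical demonstration sequence and function name.

```python
from typing import Iterator, Tuple


def candidate_sites(seq: str, rvd_len: int,
                    allowed_n0: str = "T") -> Iterator[Tuple[int, str, str]]:
    """Yield (N0 index, N0 base, RVD-contacted target) on the forward strand.

    A site is reported only if its N0 base is in `allowed_n0`.
    """
    seq = seq.upper()
    for i in range(len(seq) - rvd_len):
        n0 = seq[i]
        if n0 in allowed_n0:
            yield i, n0, seq[i + 1 : i + 1 + rvd_len]


demo = "TACGTGCATTGACC"  # hypothetical sequence, not from the study
t_only = list(candidate_sites(demo, rvd_len=4))
any_n0 = list(candidate_sites(demo, rvd_len=4, allowed_n0="ACGT"))
print(len(t_only), "sites with a 5' T;", len(any_n0), "sites with any N0 base")
```

Passing `allowed_n0="G"` or `allowed_n0="ACGT"` models, in a purely combinatorial sense, what an N0-selective or N0-permissive NTD would make available.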

Evolution of the TALE NTD to accommodate non-T 
5' residues 

To create a more flexible system for DNA recognition, we
hypothesized that we could use our recently developed
TALE-R selection system to evolve the NTD of the
TALE to remove the 5' T constraint (Supplementary
Scheme S1) (9). Libraries were generated with residues
K230 through G234 randomized, and TALE-Rs with 
activity against each possible 5' base were isolated after 
several rounds of selection (Figure 2a-c). The most active 
selected clones exhibited strong conservation of K230 and 
G234; the former may contact the DNA phosphate 
backbone, and the latter may influence hairpin loop for- 
mation (Supplementary Figure S2) (4). In the case of 
library K230-W232, K230S was frequently observed but 
had much lower activity than K230R or K230 variants in 
nearly all variants assayed individually. One clone (NT-G) 
of several observed with a W232 to R232 mutation 
demonstrated a significant shift of selectivity from 5' T 
to 5' G; the sequence resembles that of the NTD of a 
recently described Ralstonia TALE protein in this 
region. The Ralstonia NTD, in the context of plant tran- 
scription factor reporter gene regulation, has been 
reported to prefer a 5' G in its substrate (see Supplemen- 
tary Figure S3 for a protein alignment) (15). Residue R232 
may contact the G base specifically, as indicated by the 
stringency of NT-G for 5' G. The preference of NT-G for 
a 5' G was comparable with the specificity of the wild-type 
domain for 5' T. We were unable to derive NTD variants 
specific for 5' A or 5' C, but a permissive NTD, NT-αN,
was obtained that resembles the K265-G268 N0 hairpin
and accepts substrates with any 5' residue while maintaining
high activity. We hypothesize that this variant makes
enhanced non-specific contacts with the DNA phosphate 
backbone compared with the wild-type NTD, enhancing 
the overall binding of the TALE-DNA complex without 
contacting a specific 5' residue. 

We hypothesized that a shortened hairpin structure 
would allow selection of variants with specificity for 5' A 
or 5' C residues. A library with randomization at Q231- 
W232 and with residue 233 deleted was designed to 
shorten the putative DNA-binding loop. Recombinase se- 
lection revealed a highly conserved Q231Y mutation that 
had high activity in a number of clones (Figure 2d). In 
particular, NT-βN demonstrated improved activity on
substrates with 5' A, C and G but diminished activity on 
5' T substrates compared with TALEs with the wild-type 
NTD (Figure 2e). 

Applications of evolved TALE NTDs 

To assess the portability of the evolved NTDs in 
designer TALE fusion protein applications, optimized 
NTDs were incorporated into TALE-TFs, MBP-TALEs 
and TALENs. TALE-TFs with NT-G, NT-αN and
NT-βN domains demonstrated 400-1500-fold increases
in transcriptional activation of a luciferase target gene
bearing operator sites without a 5' T residue when
compared with the TALE-TF with the NT-T domain.
The NT-G-based TF retained the 5' G selectivity
observed in the TALE-R selection system. The activities
of NT-αN- and NT-βN-based TFs against all 5' nucleotides
tracked the relative activity observed in the recombinase
format (Figure 3). MBP-TALEs also exhibited
greater relative binding affinity for target oligonucleotides 
with sites that did not have a 5' T than did the wild-type 
MBP-TALE (Supplementary Figure S4), providing 
further evidence that the selected domains enhanced rec- 
ognition of or tolerance for non-thymine 5' bases. 

Four of the optimized NTDs were then imported into 
the Goldy TALEN framework (10). For these experiments,
four substrates were constructed within the context
of the Δ32 locus of the CCR5 gene (Figure 4a). Each
substrate contained a different 5' residue. Experiments
included TALENs with wild-type (NT-T) and dHax3
NTDs (dHax3 is a commonly used NTD variant isolated
from Xanthomonas campestris) with specificity for 5' T
(5,14,16), to benchmark gene editing activity. The substrate
TALEN pairs were designed to retain as much
RVD homology (50-90%) as possible to determine the
activity-enhancing contributions of the variant NTDs
(Figure 4a). 

Figure 4. Design and activity of TALEN pairs with wild-type and
evolved NTDs with varying 5' bases. (A) The CCR5 gene expanded
to highlight the target site for induction of the Δ32 mutation. (B) Gene
editing efficiency of the wild-type (NT-T) TALEN, TALENs with
domains optimized for non-T 5' residues, and dHax3 NTD. (C) Fold
enhancement of the TALEN pairs with optimized NTD versus
TALENs with 5' T specificity. The activity of each NTD is shown on
each TALEN pair substrate.

Activities of the TALENs were analyzed both by 
sequencing and by using the Cel I assay (11). The
selected domains exhibited increases in gene editing 
activity between 2- and 9-fold for the non-T 5' residues 
when compared with activities of the TALEN containing 
the wild-type domain (Figure 4 and Supplementary Figure 
S5). Activity was highest on TALEN pair T1/T2 with 
wild-type or dHax3 NTD. The TALEN pair substrate 
G1/G2 was processed most effectively by TALENs with 
NT-αN, NT-βN and NT-G, with 2.0-3.5-fold enhancement
versus NT-T. NT-αN had activity 9- and 2-fold
higher than the wild-type NT-T on TALEN pairs A1/A2 
and C1/C2, respectively. Although the impact of a 
mismatch at the 5' residue is more modest in TALENs 
than in TALE-TF and TALE-R frameworks, the 
optimized NTDs greatly improved TALEN activity 
when used in gene editing experiments. 
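
The TALEN results above can be summarized as a simple lookup from the N0 base of a target site to the NTDs that performed best on the corresponding TALEN pair substrate. The sketch below encodes only what is reported in this section for the CCR5 substrates; it is not a general design rule, and the dictionary and function names are hypothetical.

```python
# NTDs reported as most active on each CCR5 TALEN pair substrate,
# keyed by the 5' (N0) base of the site. Summary of this section only.
NTDS_BY_N0 = {
    "T": ("NT-T", "dHax3"),           # pair T1/T2
    "G": ("NT-αN", "NT-βN", "NT-G"),  # pair G1/G2
    "A": ("NT-αN",),                  # pair A1/A2 (9-fold over NT-T)
    "C": ("NT-αN",),                  # pair C1/C2 (2-fold over NT-T)
}


def ntds_for_site(site: str) -> tuple:
    """Return candidate NTDs for a site whose first character is the N0 base."""
    if not site:
        raise ValueError("site must not be empty")
    n0 = site[0].upper()
    if n0 not in NTDS_BY_N0:
        raise ValueError(f"unexpected N0 base: {n0!r}")
    return NTDS_BY_N0[n0]


print(ntds_for_site("gACGTTCA"))  # hypothetical site with a 5' G
```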



DISCUSSION 

Most, but not all, previous studies have suggested that 
a thymidine is required as the 5'-most residue in 
design of optimal TALE DNA-binding domains 
(3,5-7,10,13,14,17). The analyses described here indicate 
that a thymidine is optimal, and in some cases critical, for 
building functional TALE fusion proteins. This require- 
ment therefore imposes limitations on the sequences that 
can be effectively targeted with TALE transcription 
factor, nuclease and recombinase chimeras. Although 
this requirement theoretically imposes minor limitations
on the use of TALENs for inducing gene knockout,
given their broad spacer region tolerance, NTDs that
can accommodate any 5' residue would further simplify 
the rules for effective TALE construction and greatly 
enhance applications requiring precise TALE placement 
for genome engineering and interrogation (e.g. precise 
cleavage of DNA at a defined base pair using TALENs, 
seamless gene insertion and exchange via TALE- 
Recombinases, displacement of natural DNA-binding 
proteins from specific endogenous DNA sequences to in- 
terrogate their functional role, the development of orthog- 
onal transcription factors for pathway engineering, the 
synergistic activation of natural and synthetic genes 
wherein transcription factor placement is key (18,19) and 
many other applications). Other uses in DNA-based nano- 
technology include decorating DNA nanostructures/ 
origami with specific DNA-binding proteins (20,21). 
Here, targeting to specific sites is constrained based on 
DNA folding/structure and thus being able to bind any 
site is critical. Elaboration of these structures and devices 
with DNA-binding proteins could be a fascinating 
approach to expanding function. Indeed, it is not difficult 
to imagine many applications for DNA binding proteins 
and their fusions when all targeting constraints are 
removed. Encouraged by these potential applications, we 
aimed to develop NTDs that enable targeting of sites 
initiated at any base. 

We used our recently developed TALE-R system to 
evolve the NTD of the TALE to remove the 5' T con- 
straint. In three rounds of selection, we obtained an 
NTD with specificity for a 5' G. Numerous selections 
were performed in attempts to obtain variants that
recognized either 5' A or 5' C. We inverted the G230-
K234 hairpin, extended the K230-G234/ins232 hairpin,
attempted modification of the K265-G268 N0 hairpin,
and evaluated random mutagenesis libraries. None of
these strategies yielded NTDs with affinity for target sequences
with 5' A or 5' C, although we did identify an
NTD, NT-βN, with a deletion that recognized substrates
with both 5' A and 5' C residues with acceptable affinity.

The strong selection preference exhibited by the NTDs 
NT-T and NT-G and the importance of W232 in NT-T 
and R232 in NT-G are likely due to specific interactions of 
these amino acids with the 5' terminal residue of the DNA
recognition sequence. It was recently reported that the
Ralstonia solanacearum TALE stringently requires a 5'
G, and a sequence alignment with NT-G shows what
appears to be a comparable N-1 hairpin containing an
arginine at the position analogous to 232 in NT-G
(Supplementary Figure S3). Owing to the high structural
homology between the NTDs Brg11 and NT-T, it may be
possible to modify the preference of the Ralstonia TALE
NTD to thymine by a simple arginine to tryptophan
mutation or to eliminate specificity by grafting NT-αN
or NT-βN domains into this related protein. It is also
interesting to note that arginine-guanine interactions are 
common in evolved zinc finger domains (22). 

The variant NTDs selected were successfully imported 
into TALE-TFs, MBP-TALEs and TALENs and gener- 
ally conferred the activity and specificity expected based 
on data from the recombinase evolution system. TALE- 
TFs with optimized NTDs enhanced TALE activation 
between 400- and 1500-fold relative to the activity of 
NT-T against AvrXa7 promoter sites with non-T 5' 
residues. When incorporated into TALENs, our NTD 
with non-T selectivity enhanced activity 2-9-fold relative 
to that of the NT-T domain on substrates with 5' A, C or 
G. The increases in TALEN gene editing generally 
correlated with increases in activity observed in TALE-R 
and TALE-TF constructs. The specificity and high activity 
of NT-G was maintained, as evidenced by the lower 
activity in assays with TALEN pairs A1/A2, C1/C2, and 
T1/T2, and the generally high activity of NT-αN and NT-βN
was also imparted into the TALEN Δ152/+63
architecture. 

It was recently reported that alternatively truncated 
TALEs with synthetic TALE RVD domains do not 
require a 5' T in the DNA substrate (7). We constructed 
the reported Δ143, +47 truncation as a Goldy TALE-TF
and observed substantially lower activity on the AvrXa7
substrate than we observed with the Δ127, +95 truncation,
which has been most commonly used by others
and which is the truncation set used in our study 
(Supplementary Figure S6) (7,14). Thus, the difference in 
reported outcomes could be due to the truncated architec- 
tures used. 

In summary, we confirmed the importance of a 5' thy- 
midine in the DNA substrate for binding and activity of 
designed TALEs in the context of TALE-R, TALE-TF, 
MBP-TALEs and TALEN chimeras. Targeted mutagen- 
esis and TALE-R selection were applied to engineer 
TALE NTDs that recognize bases other than thymine as
the 5'-most base of the substrate DNA. The engineered
TALE domains developed here demonstrated modularity
and were highly active in TALE-TF and TALEN architectures.
These novel NTDs expand by ~15-fold the
number of sites that can be targeted by current
TALE-Rs, which have strict geometric requirements on
their binding sites and which are highly sensitive to the
identity of the N0 base (9). Furthermore, they now allow
for the precise placement of TALE DBDs and TALE-TFs 
at any DNA sequence to facilitate gene regulation, 
displacement of endogenous DNA-binding proteins and 
synthetic biology applications where precise binding 
might be key. Although TALENs based on the native 
NTD show varying degrees of tolerance of N0 base substitutions,
our data indicate that the novel NTDs reported
here also facilitate higher efficiency gene editing with any
N0 base as compared with natural NTD-based TALENs.
With the removal of all sequence targeting restraints from 
TALE-based proteins, we envision the ever-expansive use 
of this technology in genome engineering, synthetic 
biology, medicine and nanotechnology. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 